eZ Component: Document, Design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Design description
==================

ezcDocument interface
---------------------

An interface that defines an abstract document class. Classes that implement
this interface are named after their format, for example 'ezcDocumentHtml'.

These classes can create document objects from a certain type of data
(text, DOM, XML, file) with static functions like 'createFromText()'. Once the
object is created, its content can be retrieved in any type of data that
is supported by the format, using functions like 'getText()' or 'getXML()'.
If the requested type of data is not supported by the format, an error is
thrown.
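
As a rough illustration of the interface described above, the following sketch
shows one possible shape. Only 'createFromText()', 'getText()' and 'getXML()'
come from this document. The class 'ezcDocumentPlain' and the exception class
are hypothetical names chosen for the example, not the final API.

```php
<?php
// Hypothetical sketch: ezcDocumentPlain and the exception class are
// assumptions made for illustration, not part of the actual design.

class ezcDocumentUnsupportedTypeException extends Exception
{
}

interface ezcDocument
{
    // Creates a document object from plain text.
    public static function createFromText( $text );

    // Returns the document content as text.
    public function getText();

    // Returns the document content as an XML string.
    public function getXML();
}

class ezcDocumentPlain implements ezcDocument
{
    private $text;

    private function __construct( $text )
    {
        $this->text = $text;
    }

    public static function createFromText( $text )
    {
        return new self( $text );
    }

    public function getText()
    {
        return $this->text;
    }

    public function getXML()
    {
        // This toy format does not support XML, so it reports an error.
        throw new ezcDocumentUnsupportedTypeException(
            'XML output is not supported by this format.'
        );
    }
}

$doc = ezcDocumentPlain::createFromText( 'Hello' );
echo $doc->getText(), "\n";
```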

ezcDocConverter interface
-------------------------

An interface that defines an abstract conversion class. Concrete conversion
classes implement the conversion of a given document from one format to
another. The names of these classes follow the pattern ezcDocFormat1ToFormat2,
where Format1 and Format2 are format names like 'Html' or 'Docbook'. For
example, 'ezcDocHtmlToDocbook' implements conversion from HTML to DocBook.

The main function of this interface, 'convert()', takes a document in one
format and returns it in another. Both the argument and the return value are
objects of 'ezcDocument' implementation classes.
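
The naming pattern and the role of 'convert()' can be sketched as follows.
This is a minimal illustration under assumptions: 'ezcDocumentStub' stands in
for a real 'ezcDocument' implementation, and the 'Upper' and 'Lower' formats
are invented for the example.

```php
<?php
// Hypothetical sketch: ezcDocumentStub and the 'upper'/'lower' formats are
// assumptions made for illustration only.

// Minimal stand-in for an ezcDocument implementation class.
class ezcDocumentStub
{
    public $text;
    public $format;

    public function __construct( $text, $format )
    {
        $this->text   = $text;
        $this->format = $format;
    }
}

interface ezcDocConverter
{
    // Takes a document in one format and returns it in another.
    public function convert( ezcDocumentStub $doc );
}

// Follows the ezcDocFormat1ToFormat2 naming pattern.
class ezcDocUpperToLower implements ezcDocConverter
{
    public function convert( ezcDocumentStub $doc )
    {
        if ( $doc->format !== 'upper' )
        {
            throw new InvalidArgumentException( "Expected an 'upper' document." );
        }
        return new ezcDocumentStub( strtolower( $doc->text ), 'lower' );
    }
}

$converter = new ezcDocUpperToLower();
$result    = $converter->convert( new ezcDocumentStub( 'HELLO', 'upper' ) );
echo $result->format, ': ', $result->text, "\n";
```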

ezcDocParser class
------------------

Contains methods for text document parsing using a formal grammar. Exact formal
grammars and format-specific callback functions (if needed) are set in
format handling classes.

ezcDocXSLTTransformer class
---------------------------

A class based on ezcDocConverterBase for transforming DOM documents using special
rules and element callback handlers. Contains only methods for document
transformation. Exact rules and element handlers are set in derived classes.

ezcDocOutput class
------------------

Outputs the given document tree as text using a simple internal templating
system. It also takes care of text indentation to show the structure of the
document. The exact templates for element output and the helper formatting
functions are set in derived classes.

ezcDocOutputTemplate class
--------------------------

Implemented in the DocumentTemplateTieIn component. It extends the
ezcDocOutput class to use the Template component for element output.

ezcDocValidator
---------------

Validates a document or a separate element against its schema. This class
takes a schema in RelaxNG format as its input.


Algorithms
==========

Transforming XML
----------------

XML documents are transformed using XSLT stylesheets and the XSL extension for
PHP. Transformations are done with the ezcDocXSLTTransformer class.
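
The following sketch shows what such a transformation looks like with PHP's
XSLTProcessor class from the XSL extension. The function name
'transformWithXslt()' and the stylesheet are invented for this example and do
not describe how ezcDocXSLTTransformer is actually implemented.

```php
<?php
// Sketch only: transformWithXslt() and the stylesheet are illustrative
// assumptions. Requires PHP's DOM and XSL extensions.

function transformWithXslt( DOMDocument $source, $xsltString )
{
    $stylesheet = new DOMDocument();
    $stylesheet->loadXML( $xsltString );

    $processor = new XSLTProcessor();
    $processor->importStylesheet( $stylesheet );

    // Returns the result as a new DOMDocument, or false on failure.
    return $processor->transformToDoc( $source );
}

// Turns a <para> root element into a <p> element.
$xslt = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

// The XSL extension is optional, so the demo only runs when it is loaded.
if ( extension_loaded( 'xsl' ) )
{
    $source = new DOMDocument();
    $source->loadXML( '<para>Hello</para>' );

    $result = transformWithXslt( $source, $xslt );
    echo $result->saveXML( $result->documentElement ), "\n";
}
```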


Parsing text/XML
----------------

The ezcDocParser class parses the input text and presents it as a DOM tree.

This is not an implementation of a real context-free parser.
It assumes that the input language is XML-like, i.e. that it consists
of elements that have opening and ending parts and some content
between them (which may contain other elements).

Sometimes it is hard or impossible to formalize the input in these terms.
In such cases special algorithms or custom element handlers are used.
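
The XML-like assumption can be illustrated with a small stack-based sketch
that turns text into a DOM tree. The bracket markup ('[b]...[/b]') and the
function name 'parseBracketMarkup()' are assumptions made for this example.
The real ezcDocParser is driven by format-specific grammars.

```php
<?php
// Hypothetical sketch: the bracket markup and parseBracketMarkup() are
// assumptions made for illustration only.

function parseBracketMarkup( $input )
{
    $dom  = new DOMDocument();
    $root = $dom->createElement( 'document' );
    $dom->appendChild( $root );

    // The last element on the stack is the one that is currently open.
    $stack = array( $root );

    // Split the input into tags and the text between them.
    $tokens = preg_split(
        '#(\[/?[a-z]+\])#',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    foreach ( $tokens as $token )
    {
        $current = end( $stack );
        if ( preg_match( '#^\[([a-z]+)\]$#', $token, $match ) )
        {
            // Opening part: create the element and descend into it.
            $element = $dom->createElement( $match[1] );
            $current->appendChild( $element );
            $stack[] = $element;
        }
        elseif ( preg_match( '#^\[/([a-z]+)\]$#', $token, $match ) )
        {
            // Ending part: it must close the element that is currently open.
            if ( count( $stack ) < 2 || $current->nodeName !== $match[1] )
            {
                throw new RuntimeException( "Unexpected closing tag '{$token}'." );
            }
            array_pop( $stack );
        }
        else
        {
            // Plain text content between the tags.
            $current->appendChild( $dom->createTextNode( $token ) );
        }
    }

    if ( count( $stack ) !== 1 )
    {
        throw new RuntimeException( 'Unclosed element at end of input.' );
    }
    return $dom;
}

$dom = parseBracketMarkup( 'Some [b]bold and [i]italic[/i][/b] text.' );
echo $dom->saveXML( $dom->documentElement ), "\n";
```

Input that does not nest properly is rejected here with an exception. This is
the kind of case where the design expects custom element handlers to step in.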

Document output
---------------

The ezcDocOutput class outputs the given document tree as text using a simple
internal templating system. It also takes care of text indentation to show
the structure of the document.

The exact templates for element output and the helper formatting functions are
set in derived classes. Templates are simple strings in which some character
or sequence is replaced with another string using str_replace.

The ezcDocOutputTemplate class is implemented in the DocumentTemplateTieIn
component. It extends this class to use the Template component for element
output.
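
A minimal sketch of str_replace-based output with indentation could look like
this. The placeholders '%indent%' and '%content%', the template strings and
the function name 'outputNode()' are assumptions made for the example. The
actual templates are defined by the derived classes.

```php
<?php
// Hypothetical sketch: the placeholders, the templates and outputNode()
// are assumptions made for illustration only.

function outputNode( DOMNode $node, array $templates, $level = 0 )
{
    $indent = str_repeat( '  ', $level );

    if ( $node instanceof DOMText )
    {
        // Text is written on its own line; whitespace-only text is skipped.
        $text = trim( $node->nodeValue );
        return $text === '' ? '' : $indent . $text . "\n";
    }

    // Render the children one level deeper.
    $content = '';
    foreach ( $node->childNodes as $child )
    {
        $content .= outputNode( $child, $templates, $level + 1 );
    }

    // Elements without a template only output their content.
    $template = isset( $templates[$node->nodeName] )
        ? $templates[$node->nodeName]
        : '%content%';

    // '%indent%' is replaced first, so text content is never re-scanned.
    return str_replace(
        array( '%indent%', '%content%' ),
        array( $indent, $content ),
        $template
    );
}

$templates = array(
    'section' => "%indent%<div>\n%content%%indent%</div>\n",
    'para'    => "%indent%<p>\n%content%%indent%</p>\n",
);

$dom = new DOMDocument();
$dom->loadXML( '<section><para>Hello</para><para>World</para></section>' );
echo outputNode( $dom->documentElement, $templates );
```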

Validating documents
--------------------

ezcDocValidator is used to validate a document or a separate element
against its schema.

This class takes a schema in RelaxNG format as its input and transforms it
into an internal format for fast processing. The processed schema is stored
in a cached .php file for faster access in the future.
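
One common way to cache processed data in a .php file is to write it out with
var_export() and read it back with include. The sketch below assumes that the
processed schema is a plain array. The function names, the file name and the
array layout are hypothetical.

```php
<?php
// Hypothetical sketch: function names, file name and array layout are
// assumptions made for illustration only.

function storeCompiledSchema( $cacheFile, array $compiled )
{
    // var_export() produces valid PHP code for arrays and scalars.
    $code = "<?php\nreturn " . var_export( $compiled, true ) . ";\n";
    file_put_contents( $cacheFile, $code, LOCK_EX );
}

function loadCompiledSchema( $cacheFile )
{
    if ( !is_file( $cacheFile ) )
    {
        return null;
    }
    // The cache file returns the array it was generated from.
    return include $cacheFile;
}

$cacheFile = sys_get_temp_dir() . '/ezc-schema-cache-example.php';
storeCompiledSchema( $cacheFile, array( 'elem1' => 'processed rule for elem1' ) );

$schema = loadCompiledSchema( $cacheFile );
echo $schema['elem1'], "\n";

unlink( $cacheFile );
```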

Fast validation is based on regular expressions and strings.
Here is an example::

    <element name="elem1">
      <zeroOrMore>
        <element name="elem2">
          ...
        </element>
      </zeroOrMore>
      <element name="elem3">
        ...
      </element>
    </element>

This RelaxNG schema for the element's content can be represented by the
regular expression::

    '#^(elem2)*elem3$#'

The expression is anchored at both ends so that the whole list of children
has to match it. The children of the validated document element can also be
represented by a string, for instance 'elem2elem2elem3', which is then matched
against this regular expression.

A similar process is used for attributes.
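
The idea can be sketched in a few lines. The function name
'validateChildren()' is hypothetical. As an addition to the scheme described
above, the sketch appends a space after every element name so that names
which are prefixes of each other cannot be confused, and it anchors the
pattern so that the complete list of children has to match.

```php
<?php
// Hypothetical sketch: validateChildren() and the space separator are
// assumptions made for illustration only.

function validateChildren( DOMElement $element, $pattern )
{
    // Build a string from the names of the child elements.
    $names = '';
    foreach ( $element->childNodes as $child )
    {
        if ( $child instanceof DOMElement )
        {
            $names .= $child->nodeName . ' ';
        }
    }
    return preg_match( $pattern, $names ) === 1;
}

// Content model of 'elem1': zero or more 'elem2', then exactly one 'elem3'.
$pattern = '#^(elem2 )*elem3 $#';

$dom = new DOMDocument();

$dom->loadXML( '<elem1><elem2/><elem2/><elem3/></elem1>' );
var_dump( validateChildren( $dom->documentElement, $pattern ) ); // bool(true)

$dom->loadXML( '<elem1><elem3/><elem2/></elem1>' );
var_dump( validateChildren( $dom->documentElement, $pattern ) ); // bool(false)
```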


Examples
========

Converting Format1 to Format2
-----------------------------

The conversion goes through the internal format in two steps::

    $docFormat1 = new ezcDocumentText( $text, 'format1' );
    $converter1 = new ezcDocFormat1ToInternal( $parameters1 );
    $docInternal = $converter1->convert( $docFormat1 );
    $converter2 = new ezcDocInternalToFormat2( $parameters2 );
    $docFormat2 = $converter2->convert( $docInternal );
    $result = $docFormat2->getText();

The same with static functions::

    $docFormat1 = new ezcDocumentText( $text, 'format1' );
    $docInternal = ezcDocFormat1ToInternal::convert( $docFormat1, $parameters1 );
    $docFormat2 = ezcDocInternalToFormat2::convert( $docInternal, $parameters2 );
    $result = $docFormat2->getText();